Syntactic-Prosodic Labeling of Large Spontaneous Speech Databases
Authors
Abstract
The research project underlying this report was funded by the German Federal Minister for Education, Science, Research and Technology under grant numbers 01 IV 102 F/4 and 01 IV 102 H/0. Responsibility for the content of this work lies with the authors.

In automatic speech understanding, dividing continuously running speech into syntactic chunks is a major problem. Syntactic boundaries are often marked by prosodic means. Training statistical models for prosodic boundaries requires large databases. For the German Verbmobil project (automatic speech-to-speech translation), we developed a syntactic-prosodic labeling scheme in which two main types of boundaries (major syntactic boundaries and syntactically ambiguous boundaries), along with some other special boundaries, are labeled for a large Verbmobil spontaneous speech corpus. We compare the results of classifiers (multi-layer perceptrons and language models) trained on these syntactic-prosodic boundary labels with classifiers trained on perceptual-prosodic labels and on purely syntactic labels. The main advantage of the rough syntactic-prosodic labels presented in this paper is that large amounts of data could be labeled within a short time. Therefore, the classifiers trained with these labels turned out to be superior (recognition rates of up to 96%).